Friday, December 21, 2007

Why Python?

The author of the first comment on the Quameon entry wonders if Python will be too slow.


One reason to use Python is ease of programming. Hopefully the advantages of programming in Python will outweigh the performance penalty.


However, at some point, runtime speed will be an issue. Here's a list of steps to improve runtime performance (in increasing order of effort/expected return):


  1. Use psyco. Put "import psyco; psyco.full()" at the beginning of your program and get an instant speed increase.
  2. Use the Shed Skin Python-to-C++ compiler. (I haven't tried this yet, so I don't know if it works with Quameon.)
  3. Profile and rewrite slow parts in C/C++/Fortran. I've done this with the inter-particle distance computation in a classical MC code in Python, and gained some performance.
  4. Code generation. Right now, the code for the Gaussian basis functions is auto-generated (no more computing derivatives by hand!!). I would like to move more of the basic formulas to this technique. Then it should be easier to retarget the code generation to another language (in theory, I haven't tried it yet).


Other tidbits:

Monte Python uses Python and C++ for QMC (paper, code). They wrote the time-consuming parts in C++, and use Python to glue it all together. The algorithm variants they tested (parallel distribution strategies for DMC) were implemented by changing only the Python part of the code.


Code generation could potentially result in *faster* code than writing a general framework in C++ or Fortran. This is because the code can be specialized for a given physical system, exposing more optimization opportunities. For an example of code generation, look at the development SVN area of PyQuante - there is a program that will produce a C++ program to compute the energy for a molecule given in the input file.
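A toy illustration of why specialization can win (this is not the PyQuante generator mentioned above, just a sketch of the idea): with the physical system fixed, the generated source can bake constants in and unroll the pair loop, giving the compiler or interpreter less generic work to do.

```python
# Toy example of specializing generated code for a fixed system.
# Here the "system" is three point charges on a line; the generated
# function has the pair loop unrolled and all constants baked in.
charges = [(1.0, 0.0), (1.0, 1.2), (8.0, 0.6)]  # (charge, position)

lines = ["def nuclear_repulsion():", "    return ("]
terms = []
for i in range(len(charges)):
    for j in range(i + 1, len(charges)):
        (qi, xi), (qj, xj) = charges[i], charges[j]
        terms.append(f"        {qi * qj} / abs({xi} - {xj})")
lines.append(" +\n".join(terms))
lines.append("    )")
src = "\n".join(lines)

namespace = {}
exec(src, namespace)          # compile the specialized function
energy = namespace["nuclear_repulsion"]()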

Wednesday, December 05, 2007

Quameon

I've been working on a Quantum Monte Carlo code implemented in Python. It's called "Quameon" (the name is a mixture of letters from "Quantum Monte Carlo in Python"). There's a SourceForge project for it - here are links to the SourceForge project page, the project web page, and the project wiki.

The code doesn't do much currently, but I mention the project now so I can write blog entries about various aspects of getting the code working.


I'm currently trying to validate that the code works by using HF single particle orbitals (obtained from PyQuante) and no Jastrow factor - this should reproduce the HF energy. It seems to work for a few atoms (He, Be, C), after sorting out some normalization issues with the basis sets (apparently not all basis sets are normalized - this doesn't affect the energy, but does affect the eigenvector coefficients). I will need longer runs to be sure. The next step is to profile the code to look for any quick and easy performance tuning opportunities.

The first target (after getting the code running reliably) is to implement several forms for jastrow factors and several VMC parameter optimization algorithms.

Thursday, October 04, 2007

Keeping notes

Keeping a record of work is important for research. I use two main tools for this purpose. First, a text file with entries in chronological order (one file per month, with names like "Sept2007"). I write the date and then start taking notes for that day in the file.

  • Pro - easy to use, chronological view easily available
  • Con - math hard to write well, ideas spread over several days hard to view topically.


The other tool I use is a wiki. It's useful to keep data in a structured, hierarchical format that's easy to edit.

  • Pro - ideas arranged topically, can do better looking math
  • Con - must work topically from the start, no good chronological view (Yes, I know you can view changes in order, but it's not as easy to reconstruct what I was thinking on a particular day.)


I want a single tool that does both of these - that can view the notes both chronologically and by topic.
For work-flow, I want the ability to enter ideas and notes free-form, and perform categorization and annotation at a separate time.

Another feature that would be nice is recording other actions on the system (at least by reference).
One example is check-ins to source control.

Another possible example: what if gnuplot were to record all the commands issued (and kept a copy of all the files used)? Graphs could then easily be re-created and played back at a later date.

I looked around a little bit to see if there are other applications that might meet some of these requirements (or at least be an improvement over text files and a wiki). One class of app is a note-taking application. I looked at Tomboy. It does have a LaTeX plugin (though I could not get it to work). LaTeX side-rant: It's great for making nice looking mathematics, but you wind up with "equations under glass". It's pretty to look at, but you can't manipulate it further. For mathematical content, I want something with more precise semantics (Content MathML, for example).

Another class of tool is mind mapping software. I looked at Vym (View your mind) briefly, but I don't think I will like this type of software. Viewing a graph of links is nice, but not as the main working view - I prefer a wiki view (pages with links).

Sunday, August 19, 2007

Black Box Reweighting

In the last post, I wondered whether the black box reweighting method might help with the bias.


The Black Box Reweighting (BBRW) method computes weights using the observed distribution of samples, rather than the targeted distribution. (The author's website contains a link to supplementary material with further discussion.)


I ran a simple test, and the bias was slightly worse with the BBRW method, compared to standard reweighting.


The BBRW method does a fine job of solving the problem the authors intended (correct handling of a distribution that may be incompletely sampled, and lower variance of the result - the latter was apparent in my tests), but it doesn't solve the reweighting bias issue.


The search for a method to correct reweighting bias continues...

Tuesday, August 14, 2007

Reweighting is biased

The reweighting technique, where samples from one probability distribution are adjusted with a weight factor to compute averages from a different (but usually very similar) distribution, is biased for a finite number of samples.
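The bias is easy to see numerically. A minimal sketch (my own illustration, not taken from the paper): draw samples from a standard normal, reweight them toward a shifted normal, and average many small-sample estimates. The self-normalized estimate (a ratio of two sample averages) comes out systematically below the exact answer.

```python
# Numerical check that self-normalized reweighting is biased at finite
# sample size.  Samples come from p = N(0,1) and are reweighted to
# q = N(0.5,1); the exact mean of x under q is 0.5.
import numpy as np

rng = np.random.default_rng(0)
delta = 0.5          # shift between the two distributions
n = 5                # samples per estimate (small, so the bias shows)
repeats = 200_000    # independent estimates to average over

x = rng.standard_normal((repeats, n))       # samples from p
w = np.exp(delta * x - 0.5 * delta**2)      # weights q(x)/p(x)
est = (w * x).sum(axis=1) / w.sum(axis=1)   # reweighted mean of f(x)=x

mean_est = est.mean()
bias = mean_est - delta
print(f"mean estimate {mean_est:.3f}, bias {bias:+.3f}")
```

The bias shrinks like 1/n, so it vanishes in the large-sample limit but matters for short runs.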

The details can be found here.


Question for the audience: Is this known? I don't recall seeing this mentioned any time the method has been presented.

Some possible routes to fix the problem are listed in the paper, but alas, I haven't been able to turn any of them into a useful solution yet.

Friday, August 10, 2007

VMC optimization papers

A recent comment suggested these papers on VMC optimization:


I'd also add these to the pile of papers to read and understand:




And since this post is already about VMC optimization, some further questions:


  1. What about simultaneous geometry and parameter optimization?

  2. The focus seems to be on single geometries. What's likely more interesting in the future is a family of geometries - has anyone parameterized the VMC parameters by bond lengths or nuclear coordinates?

Correct estimators in DMC without forward walking?

Computing properties other than the energy using Diffusion Monte Carlo requires forward walking to get correct answers. That may change, if this paper - Hellman-Feynman operator sampling in Diffusion Monte Carlo calculations - is correct.


Although as I look at the paper more, the technique doesn't look any easier than the forward walking implementation by Casulleras and Boronat (the paper does mention this similarity). It appears to use similar data in a different combination. I would be interested to see how the noise, bias, and stability characteristics compare.


The Hellmann-Feynman theorem seems to be getting lots of attention in the QMC world recently. It's also been used to construct Zero-Variance Zero-Bias estimators (most recent example here).

Thursday, August 02, 2007

Gravitational N-body and the Art of Computational Science

A rather ambitious project to describe a gravitational N-body code and how to write it in Ruby: The Art of Computational Science


As Greg Wilson points out, something like this would be nice to have in Python.

Saturday, March 17, 2007

Efficient Global Optimization

The optimization method mentioned last post is an improvement on Efficient Global Optimization, presented in the paper Efficient Global Optimization of Expensive Black-Box Functions (also available from one of the authors' websites).


Another paper listed in the references, A Taxonomy of Global Optimization Methods Based on Response Surfaces (by D. R. Jones, one of the authors of the EGO paper), is a nice overview of various optimization methods. It presents each method and gives ways in which it can be fooled into choosing the wrong point as the optimum. A fix is given, which then leads to the next method in the list.


One of the core ideas is regression based on Gaussian processes (kriging). To learn more about Gaussian processes, see www.gaussianprocess.org. The two Jones papers listed above also show how the GP based methods fit in with other basis set methods.
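For concreteness, here is a minimal GP (kriging) regression sketch: given a few observations of an expensive function, compute the posterior mean and variance at test points with a squared-exponential kernel. The function, hyperparameters, and jitter value are all illustrative choices, not fitted.

```python
# Minimal Gaussian-process (kriging) regression: posterior mean and
# variance at test points, with a squared-exponential kernel.
# Hyperparameters are illustrative, not fitted.
import numpy as np

def sq_exp_kernel(a, b, length=1.0):
    """Squared-exponential covariance between 1-D point sets a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

# Observed (expensive) function evaluations
x_train = np.array([-2.0, -0.5, 1.0, 2.5])
y_train = np.sin(x_train)

K = sq_exp_kernel(x_train, x_train) + 1e-8 * np.eye(len(x_train))
x_test = np.linspace(-3, 3, 61)
Ks = sq_exp_kernel(x_test, x_train)

weights = np.linalg.solve(K, y_train)
mean = Ks @ weights                                # posterior mean
var = 1.0 - np.diag(Ks @ np.linalg.solve(K, Ks.T)) # posterior variance
```

The mean interpolates the data, and the variance vanishes at the data points and grows away from them - the quantity EGO-style methods exploit when deciding where to evaluate next.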

Friday, March 02, 2007

Useful paper for VMC optimization?

I ran across this paper, An informational approach to the global optimization of expensive-to-evaluate functions, and wondered: might it be useful for VMC optimization?


The type of problem being solved seems to fit:


  1. Function is expensive to evaluate.
  2. Function evaluation may be noisy.


Many optimization methods estimate an optimal point and evaluate the function at that point. The method described in the paper estimates the uncertainty in the knowledge of the function, and evaluates the function where it will best improve our knowledge of the function.
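The paper's informational criterion is more involved than I can sketch here, so the loop below uses a simpler stand-in - pick the candidate point with the largest posterior variance - just to show the "evaluate where it most improves our knowledge of the function" structure. All names, kernel settings, and the test function are illustrative.

```python
# Sketch of knowledge-driven evaluation: fit a GP to noisy observations
# and repeatedly evaluate where the posterior variance is largest.
# (Simplified stand-in for the paper's informational criterion.)
import numpy as np

def kernel(a, b, length=0.8):
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length) ** 2)

def expensive_noisy_f(x, rng):
    # Hypothetical objective with evaluation noise
    return np.sin(3 * x) + 0.05 * rng.standard_normal(x.shape)

rng = np.random.default_rng(1)
candidates = np.linspace(0, 2, 101)
x_obs = np.array([0.3, 1.7])                  # initial design points
y_obs = expensive_noisy_f(x_obs, rng)

for _ in range(5):
    # GP posterior variance at every candidate (noise on the diagonal)
    K = kernel(x_obs, x_obs) + 0.05**2 * np.eye(len(x_obs))
    Ks = kernel(candidates, x_obs)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)

    # Evaluate where our knowledge of the function is poorest
    x_next = candidates[np.argmax(var)]
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, expensive_noisy_f(np.array([x_next]), rng))
```

Each iteration places the next evaluation in the least-explored region, rather than at the current best guess of the optimum - the same spirit as the paper, even though the real criterion targets knowledge of the optimizer, not of the whole function.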


Some potential issues:


  • This method doesn't make use of gradient information, which may make it less competitive than methods that do use gradient information.

  • How well does it scale with the number of parameters? The authors present a method to keep the cost under control as the dimension increases, but is the method still effective then?

Friday, February 23, 2007

March meeting

This year's APS March meeting has a couple of sessions on QMC: H4: Recent Advances in quantum Monte Carlo Simulations and V21: General Theory: Computational Quantum Monte Carlo Methods.


Another session of interest (for me, anyway) is A23: Focus Session: High Pressure I - Earth and Planetary Materials


I will not be attending the March meeting this year, as my wife and I are expecting our first child in mid-March.

Thursday, January 18, 2007

Science Blogging Conference

The North Carolina Science Blogging Conference starts this weekend. It looks like it covers some introductory how-to material, and then dives into issues such as promoting public understanding of science, teaching, and how blogging interacts with research.


If you're put off by the term 'blog' being overused and overhyped, think of the conference as discussing the potential and challenges of a (relatively) new one-to-many and many-to-many communication mechanism and how that could affect the interactions between scientists and the public, and the interactions between scientists.

Wednesday, January 17, 2007

Basis sets galore

More basis sets than you can shake a stick at over at the EMSL Basis Set Exchange. (Not sure why you would want to shake a stick at basis sets, though.)